SECCION METODOLOGICA 

Psicologica (2008), 29, 189-203. 


Maximizing the information and validity of a linear
composite in the factor analysis model for continuous
item responses

Pere J. Ferrando* 

'Rovira i Virgili' University (Spain) 

This paper develops results and procedures for obtaining linear composites 
of factor scores that maximize: (a) test information, and (b) validity with 
respect to external variables in the multiple factor analysis (FA) model. I 
treat FA as a multidimensional item response theory model, and use 
Ackerman’s multidimensional information approach based on maximum 
likelihood (ML) estimation of trait levels. This approach, when applied to 
the FA model, leads to particularly simple results as far as maximizing test 
information is concerned. Developments concerned with validity appear to 
be new, and I use ML results in the context of error-in-variables regression. 
Graphical procedures for representing both types of results are proposed. The
developments are illustrated with two empirical examples in personality 
measurement. 


When using multidimensional instruments that measure related 
dimensions it is sometimes useful to obtain a single test score that 
represents the respondent. For example, in a multidimensional questionnaire 
that measures different facets of anxiety it can be of interest to obtain a 
single, general anxiety score. This score is generally obtained as a linear 
composite that has optimal properties in some sense. The topic of finding 
‘best’ linear composites has been widely researched in classical test theory, 
and the most usual composites are weighted combinations of the raw scores 
that maximize either the reliability, or the validity with respect to external 
variables (e.g. Gulliksen, 1950, chap. 20; Wang and Stanley, 1970).

In multidimensional item response theory (MIRT) the problem of
determining an optimal linear composite has been approached from the 
theory of maximum likelihood (ML) estimation and the related concept of 
test information. Ackerman (1994, 1996) proposed an index of
multidimensional test information as well as a general approach for
determining a linear composite in the trait space that maximizes the test 
information. Ackerman’s approach is equivalent to determining a linear 
composite whose ML estimate (MLE) has minimal conditional standard 
error of measurement (SEM). So far, Ackerman’s approach has been used 
in relation to standard nonlinear IRT models, and the results are relatively 
complex because in these models the information or the SEM varies 
generally with the trait levels. As far as the writer knows, the approach has 
not been considered in relation to external validity. 

In typical-response (personality and attitude) measurement the most
common type of item format is the graded response format, such as the
Likert scale (Dawes, 1972; Ferrando, 2002; Hofstee, Ten Berge and
Hendricks, 1998). Furthermore, more continuous formats such as line
segments, feeling thermometers, visual analogues, etc. are increasingly
used as computerized administration becomes more common (Ferrando,
2002). By far the most common model for analyzing this type of item is
linear factor analysis (FA). Although FA is a model for continuous,
unbounded variables and cannot be strictly correct for item responses (which
are to a greater or lesser degree discrete and bounded), I take the position
that FA is so widely used because it behaves as a reasonably good
approximation in most practical applications (e.g. Atkinson, 1988; Hofstee,
Ten Berge and Hendricks, 1998).

Most applied psychometricians tend to consider FA as disconnected
from IRT models. However, several authors have emphasized the
correspondence between them. In particular, McDonald (e.g. 1999)
provided a general framework in which FA and most IRT models are
particular cases of a general latent trait model described by a strong
principle of local independence. This framework is generally used for
treating IRT models as nonlinear FA models. In this paper, however, we
shall take the reciprocal approach and consider multiple FA as a
MIRT model. By using this approach, we shall show that Ackerman’s
developments for maximizing information in a linear composite become
particularly simple in the FA case. Next we shall extend the ML approach
to determine the linear composite that maximizes validity with respect to an
external variable. Finally, the results derived will be illustrated using two
empirical examples in personality measurement.

The Basic Results

In this section we shall present the general results that are needed for 
the developments in the following sections. The results will be described 
using both scalar and matrix notation. For the sake of simplicity and 
didacticism we shall present the scalar results in relation to the 
bidimensional FA model. The matrix results are generalizations which are 
valid for any number of dimensions. 

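For a respondent i and an item j, the structural model for the multiple
FA is

$$X_{ij} = \mu_j + \lambda_{j1}\theta_{i1} + \lambda_{j2}\theta_{i2} + \varepsilon_{ij}. \qquad (1)$$

In matrix notation:

$$\mathbf{X}_i = \boldsymbol{\mu} + \boldsymbol{\Lambda}\boldsymbol{\theta}_i + \boldsymbol{\varepsilon}_i. \qquad (2)$$

For fixed $\boldsymbol{\theta}$, the item responses are distributed independently (local
independence). The conditional distribution is assumed to be normal, with
mean and variance given by

$$E(X_j \mid \boldsymbol{\theta}) = \mu_j + \boldsymbol{\lambda}_j'\boldsymbol{\theta}; \qquad \mathrm{Var}(X_j \mid \boldsymbol{\theta}) = \sigma^2_{\varepsilon j} \qquad (3)$$

where the conditional mean is the linear item response function of the model.
These results imply the well-known covariance structure

$$\boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Phi}\boldsymbol{\Lambda}' + \boldsymbol{\Psi} \qquad (4)$$

where $\boldsymbol{\Phi}$ is the inter-factor correlation matrix, and $\boldsymbol{\Psi}$ is the diagonal matrix
containing the residual variances.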

Under the assumption of conditional normality above, it follows that 
the MLEs of the trait levels for respondent i are the Bartlett estimated factor 
scores (Mulaik, 1972). In matrix notation they are obtained as 


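$$\hat{\boldsymbol{\theta}}_i = (\boldsymbol{\Lambda}'\boldsymbol{\Psi}^{-1}\boldsymbol{\Lambda})^{-1}\boldsymbol{\Lambda}'\boldsymbol{\Psi}^{-1}(\mathbf{X}_i - \boldsymbol{\mu}). \qquad (5)$$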
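
As a minimal sketch of how equation (5) might be computed, the following NumPy function evaluates the Bartlett scores for a matrix of item responses. The function name `bartlett_scores` and all numerical values are hypothetical and chosen only for illustration; they are not taken from the empirical examples.

```python
import numpy as np

def bartlett_scores(X, mu, Lam, psi):
    """Bartlett (ML) factor score estimates, following equation (5).

    X   : (N, n) item responses, one row per respondent
    mu  : (n,)   item intercepts
    Lam : (n, m) factor loadings
    psi : (n,)   residual variances (diagonal of Psi)
    """
    W = Lam / psi[:, None]      # Psi^{-1} Lam, shape (n, m)
    info = Lam.T @ W            # Lam' Psi^{-1} Lam, shape (m, m)
    rhs = W.T @ (X - mu).T      # Lam' Psi^{-1} (X_i - mu), one column per respondent
    return np.linalg.solve(info, rhs).T

# Hypothetical 4-item, 2-factor values, invented for illustration only.
mu = np.array([3.0, 2.5, 3.2, 2.8])
Lam = np.array([[0.9, 0.0],
                [0.8, 0.2],
                [0.1, 0.7],
                [0.0, 0.6]])
psi = np.array([1.0, 0.9, 1.2, 1.1])
theta = np.array([[0.5, -1.0],
                  [-0.3, 0.8]])

# With error-free responses the estimator returns the trait levels themselves.
X_noise_free = mu + theta @ Lam.T
theta_hat = bartlett_scores(X_noise_free, mu, Lam, psi)
print(np.round(theta_hat, 6))
```

Solving the linear system instead of forming the explicit inverse is a numerical convenience, not something the derivation requires.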
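
Now, by using standard procedures, the information matrix (Kendall
and Stuart, 1976, chap. 18) is found to be

$$\mathbf{I}(\theta_1,\theta_2) = -E\begin{bmatrix} \dfrac{\partial^2 \ln L}{\partial\theta_1^2} & \dfrac{\partial^2 \ln L}{\partial\theta_1\,\partial\theta_2} \\[2ex] \dfrac{\partial^2 \ln L}{\partial\theta_1\,\partial\theta_2} & \dfrac{\partial^2 \ln L}{\partial\theta_2^2} \end{bmatrix} = \begin{bmatrix} \sum_{j} \dfrac{\lambda_{j1}^2}{\sigma^2_{\varepsilon j}} & \sum_{j} \dfrac{\lambda_{j1}\lambda_{j2}}{\sigma^2_{\varepsilon j}} \\[2ex] \sum_{j} \dfrac{\lambda_{j1}\lambda_{j2}}{\sigma^2_{\varepsilon j}} & \sum_{j} \dfrac{\lambda_{j2}^2}{\sigma^2_{\varepsilon j}} \end{bmatrix} \qquad (6)$$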

The asymptotic covariance (ACOV) matrix of the MLEs of $\boldsymbol{\theta}$ is
given by the inverse of I (we shall assume that I is positive definite). In
particular, the square roots of the values in the main diagonal of ACOV are
the asymptotic SEMs of the factor score estimates for the corresponding
factors. The key result for the present procedures is that the elements of I,
and so of ACOV, do not depend on $\boldsymbol{\theta}$. This means that the SEM of each set
of factor scores is constant. Finally, we note that asymptotic in this context
means as the number of items becomes arbitrarily large.
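
The information matrix, ACOV and the SEMs can be sketched numerically as follows. In matrix form the sums of loading products over residual variances amount to Λ'Ψ⁻¹Λ, the matrix that is inverted in (5). The function name `information_and_acov` and the loadings are hypothetical illustration values. The last lines repeat every item once, which doubles the information and halves ACOV, a small illustration of precision growing with test length.

```python
import numpy as np

def information_and_acov(Lam, psi):
    """Information matrix, its inverse (ACOV) and the SEM of each factor score.

    Lam : (n, m) factor loadings
    psi : (n,)   residual variances
    """
    info = Lam.T @ (Lam / psi[:, None])   # elements: sum_j lam_jk * lam_jl / sigma^2_j
    acov = np.linalg.inv(info)            # asymptotic covariance of the MLEs
    sem = np.sqrt(np.diag(acov))          # constant SEM per factor
    return info, acov, sem

# Hypothetical loadings and residual variances (illustration only).
Lam = np.array([[0.9, 0.0],
                [0.8, 0.2],
                [0.1, 0.7],
                [0.0, 0.6]])
psi = np.array([1.0, 0.9, 1.2, 1.1])

info, acov, sem = information_and_acov(Lam, psi)

# Repeating every item once doubles the test length.
info2, acov2, sem2 = information_and_acov(np.vstack([Lam, Lam]),
                                          np.concatenate([psi, psi]))
print(np.round(sem, 3), np.round(sem2, 3))
```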

Maximizing Information 

Consider now the problem of finding the linear composite of the
factors that is measured with the maximum accuracy that can be attained.
We can define this composite in a given angular direction of the $\boldsymbol{\theta}$ space, by
considering the weights of the composite as the direction cosines

$$\theta_{ci} = \beta_1\theta_1 + \beta_2\theta_2 \qquad (7)$$

so that $\beta_1^2 + \beta_2^2 = 1$. For any number of factors and in matrix notation:

$$\theta_{ci} = \mathbf{b}'\boldsymbol{\theta}, \qquad \mathbf{b}'\mathbf{b} = 1. \qquad (8)$$

By standard ML theory, it follows that the MLE of $\theta_{ci}$ is the linear
composite of the MLEs of $\theta_1$ and $\theta_2$ (Kendall and Stuart, 1976).
Asymptotically its conditional variance is given by the quadratic form

$$\mathrm{Var}(\hat\theta_{ci} \mid \theta_1,\theta_2) = \beta_1^2\,\mathrm{Var}(\hat\theta_1 \mid \theta_1,\theta_2) + \beta_2^2\,\mathrm{Var}(\hat\theta_2 \mid \theta_1,\theta_2) + 2\beta_1\beta_2\,\mathrm{Cov}(\hat\theta_1,\hat\theta_2 \mid \theta_1,\theta_2) \qquad (9)$$

where the variances and covariances are the corresponding elements of
ACOV. In matrix notation, the quadratic form is

$$\mathrm{Var}(\hat\theta_{ci} \mid \boldsymbol{\theta}) = \mathbf{b}'\,\mathrm{ACOV}\,\mathbf{b} \qquad (10)$$

and the inverse of the scalar quantity (10) is Ackerman’s (1994)
multidimensional information measure. The weights that maximize the
information of the linear composite are thus those that minimize the
conditional variance in (10), i.e. the SEM of the MLE of the linear
composite. The conditions for obtaining the weights are

$$\text{Minimize: } \mathbf{b}'\,\mathrm{ACOV}\,\mathbf{b} \quad \text{subject to: } \mathbf{b}'\mathbf{b} = 1. \qquad (11)$$


This is a standard problem of finding the extrema of a quadratic form
subject to restrictions (e.g. Basilevsky, 1983). The vector of weights b is the
eigenvector of ACOV associated with the smallest eigenvalue, and this
eigenvalue is the conditional variance of the MLE of the linear composite.
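
The eigenvector solution of (11) can be sketched in a few lines. The function name `max_information_composite` and the ACOV values are hypothetical; `numpy.linalg.eigh` is used because ACOV is symmetric and it returns eigenvalues in ascending order. The sign of an eigenvector is arbitrary, so the sketch orients it so that its largest weight is positive.

```python
import numpy as np

def max_information_composite(acov):
    """Weights b solving (11), the conditional variance, and the information."""
    evals, evecs = np.linalg.eigh(acov)       # ascending eigenvalues, orthonormal vectors
    b = evecs[:, 0]                           # eigenvector of the smallest eigenvalue
    if b[np.argmax(np.abs(b))] < 0:           # fix the arbitrary sign
        b = -b
    cond_var = evals[0]                       # b' ACOV b at the minimum
    return b, cond_var, 1.0 / cond_var

# Hypothetical ACOV matrix (illustration only).
acov = np.array([[0.20, -0.05],
                 [-0.05, 0.50]])
b, cond_var, info = max_information_composite(acov)
print(np.round(b, 3), round(cond_var, 4), round(info, 3))
```

No other unit-length direction gives a smaller value of the quadratic form in (10), which is what the eigenvalue result states.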

It is of interest to compare this simple result with the results obtained
in nonlinear standard MIRT models. Because in linear FA the elements of
ACOV do not depend on $\boldsymbol{\theta}$, once a linear composite in a given direction has
been specified, the MLE of this composite will have a constant SEM. In
contrast, in standard MIRT models the elements of ACOV do depend on $\boldsymbol{\theta}$.
So, even when a direction has been defined, the SEM of the MLE will be
different at each point along this direction. This means, in turn, that the
direction of maximum information must be obtained at each point in the $\boldsymbol{\theta}$
space. Usually maximum information in standard MIRT models can only be
assessed using graphical procedures such as clamshell plots or directional
plots (Ackerman, 1994; Reckase and McKinley, 1991).

In bidimensional and tridimensional FA solutions the results so far
obtained can also be graphically represented in a form similar to the
information plots used with nonlinear MIRT models. For the bidimensional
case, $\theta_1$ and $\theta_2$ are taken as coordinate axes, and the linear composite is
represented as a vector in this coordinate system, with direction given by
the direction cosines, and length equal to the multidimensional information
(or some suitable function of it). This plot allows us to assess visually: (a)
the direction in the $\boldsymbol{\theta}$ space of the single test score that provides the best
measurement (the direction of the vector), and (b) the precision of this
measurement (the length of the vector).
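
The quantities behind such a plot can be sketched without any plotting library. The code below evaluates the information 1/(b' ACOV b) on a grid of directions in the bidimensional case and reports the direction with the largest value, together with the tip of the vector that would be drawn. The function name `directional_information`, the one-degree grid and the ACOV values are assumptions made for illustration; the vector length is taken as the square root of the information, one possible choice of "suitable function".

```python
import numpy as np

def directional_information(acov, angles_deg):
    """Information 1 / (b' ACOV b) for unit vectors b at the given angles (2-D case)."""
    ang = np.deg2rad(angles_deg)
    B = np.column_stack([np.cos(ang), np.sin(ang)])   # direction cosines, one row per angle
    cond_var = np.einsum('ij,jk,ik->i', B, acov, B)   # quadratic form (10) for every row
    return 1.0 / cond_var

# Hypothetical ACOV matrix (illustration only).
acov = np.array([[0.20, -0.05],
                 [-0.05, 0.50]])
angles = np.arange(0.0, 180.0, 1.0)
info = directional_information(acov, angles)

best_angle = angles[np.argmax(info)]
length = np.sqrt(info.max())
tip = length * np.array([np.cos(np.deg2rad(best_angle)),
                         np.sin(np.deg2rad(best_angle))])
print(best_angle, np.round(tip, 3))
```

Because ACOV is constant over the trait space in the linear FA case, a single scan of directions is enough; in nonlinear MIRT models the scan would have to be repeated at each point.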

Maximizing External Validity

Consider now an external variable y (e.g. a criterion or a different
test) that is theoretically related to $\theta_1$ and $\theta_2$. We want to obtain the linear
composite that maximizes the value of the validity coefficient with respect
to y,

$$\theta_{cv} = \alpha_1\theta_1 + \alpha_2\theta_2, \qquad (12)$$

so that the product-moment correlation between y and $\theta_{cv}$ is as large
as possible. If the ‘true’ trait levels or factor scores were known, this would
be a standard multiple regression problem. However, the true trait levels are
not known, and what we have in this case are the MLEs of $\theta_1$ and $\theta_2$ (i.e.
Bartlett’s factor scores). We can consider these estimates as proxies for the
‘true’ trait levels, and for a generic factor g, write

$$\hat\theta_g = \theta_g + \omega_g. \qquad (13)$$

If the MLEs in (13) are used in place of the unknown true
values, the situation becomes an error-in-variables multiple regression. If
the problem of substituting proxies for true values is ignored and standard
multiple regression procedures are used, the estimated weights are likely to
be biased, and the direction of the bias is unpredictable (Bohrnstedt and
Carter, 1971).

Error-in-variables estimation in multiple regression is complex, but it
becomes notably simplified if the proxies are well-behaved and meet certain
desirable conditions (Johnston, 1972, chap. 9). In the present context, these
conditions are: (a) the measurement errors $\omega_g$ have zero expectation, (b)
the measurement errors $\omega_g$ and the true levels $\theta_g$ are linearly independent,
and (c) Var($\omega_g$) is known. We shall now show that, asymptotically, the ML
factor scores meet the three conditions.

First, MLEs are asymptotically unbiased (Kendall and Stuart, 1976).
So, from (13) we have

$$E(\hat\theta_g \mid \theta_g) = \theta_g, \quad \text{so} \quad E(\omega_g \mid \theta_g) = 0. \qquad (14)$$

It follows that the marginal expectation of the errors is zero,
which is condition (a). Furthermore, if the errors have zero expectation at
any trait level, the regression slope of $\omega_g$ on $\theta_g$ is flat, which implies that $\omega_g$
and $\theta_g$ are linearly independent (condition (b)). Finally, the variances (and
covariances) of the error terms are known (condition (c)): they are the
corresponding elements of ACOV. Now, if we denote by $\mathbf{S}_{ML}$ the
covariance matrix of the estimated factor scores, and by $\mathbf{S}_{MLy}$ the
vector containing the covariances between the estimated factor scores and y,
then the error-in-variables unbiased estimate of a (i.e. the vector of
weights that maximize validity) is given by (Johnston, 1972)

$$\hat{\mathbf{a}} = (\mathbf{S}_{ML} - \mathrm{ACOV})^{-1}\,\mathbf{S}_{MLy}. \qquad (15)$$
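
A small population-level sketch may help to see what the correction in (15) does. It assumes, as a labeled assumption, that the measurement errors are also uncorrelated with y, so that the covariance matrix of the estimated scores equals the factor covariance matrix plus ACOV while their covariances with y are unaffected. All values and the function name `eiv_weights` are hypothetical.

```python
import numpy as np

def eiv_weights(S_ml, acov, s_mly):
    """Error-in-variables weights, equation (15): (S_ML - ACOV)^{-1} S_MLy."""
    return np.linalg.solve(S_ml - acov, s_mly)

# Hypothetical population quantities (illustration only).
Phi = np.array([[1.0, 0.4],
                [0.4, 1.0]])            # covariance matrix of the true factor scores
alpha_true = np.array([0.6, 0.3])       # weights of y on the true scores
acov = np.array([[0.15, -0.05],
                 [-0.05, 0.50]])        # known error covariance of the MLEs

# Assumption: errors are independent of the true scores and uncorrelated with y.
S_ml = Phi + acov                       # covariance matrix of the estimated scores
s_mly = Phi @ alpha_true                # covariances between estimated scores and y

alpha_eiv = eiv_weights(S_ml, acov, s_mly)
alpha_naive = np.linalg.solve(S_ml, s_mly)   # ordinary regression on the proxies
print(np.round(alpha_eiv, 3), np.round(alpha_naive, 3))
```

Under these assumptions the corrected weights reproduce `alpha_true`, whereas the uncorrected regression on the proxies does not.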


As in the previous section, the linear composite that maximizes
validity can be represented as a vector in a given direction in the $\boldsymbol{\theta}$ space. In
this case, however, the length of the vector depends on the characteristics of
the external variable y. What I propose is to scale this vector using the same
criteria as in the previous section, so that it can be represented together with
the linear composite that maximizes information. In more detail, I propose
to represent the maximum validity composite as a vector in the same
coordinate system as above, with the obtained direction, and with length
equal to the value of the multidimensional information provided in this
direction. The plot that displays both vectors simultaneously (maximal
information and maximal validity) is essentially a clamshell plot, and
provides clear and quick information regarding (a) the extent to which the
directions of measurement for the composite that maximizes information
and the composite that maximizes validity are similar, and (b) the amount of
precision of both sets of scores.

Illustrative Examples 

The results so far discussed are illustrated using two empirical
examples in the personality domain. The first example used a Spanish
version of Buss and Perry’s (1992) aggression questionnaire (AQ; Andreu
et al., 2002; Morales-Vives et al., 2005; Condon et al., 2006) as the main
measure. The AQ is a multidimensional questionnaire made up of 5-point
Likert items that measures four related dimensions of aggression. For the
present illustration I chose the subscales that measure physical aggression
(PA, 9 items), and anger (AN, 6 items). As external measure I used the
dysfunctional impulsivity (DI) scale scores of the Spanish version of
Dickman’s impulsivity inventory (Chico et al., 2003). According to the
theory, DI is positively related to aggression (Vigil-Colet and Codorniu,
2004). The measures were administered to a sample of 241 secondary
school students between 12 and 17 years old. Data were kindly supplied by
Dr. A. Vigil.

The bidimensional FA model in (1) was fitted to the 15 AQ items
using the Jöreskog-Howe specifications (Howe, 1955; Jöreskog, 1979).
Item 1 was used as a marker for PA and item 10 as a marker for AN. With
this specification the estimated interfactor correlation was 0.47. The
model was fitted using robust ML estimation as implemented in the Mplus
program (Muthén and Muthén, 1999). The fit was reasonably good. The
root mean squared error of approximation (RMSEA) and its 90%
confidence interval were 0.04 and (0.03; 0.06), respectively, and the
Non-Normed Fit index was 0.97. The parameter estimates of the model are
shown in table 1.

As table 1 shows, the solution is far from approaching an
independent-cluster structure. However, the factors can be reasonably well
distinguished, and generally tend to agree with the subscale labels. The first
factor, which is mainly defined by the PA items, is defined by more
indicators and has a clearer structure. The second factor, which is more
related to the AN items, is less well defined and has a more complex
structure.

Next I used the results in table 1 and obtained the information matrix
according to (6) and its inverse, the ACOV. The matrices are shown below

$$\mathbf{I} = \begin{bmatrix} 7.42 & 0.75 \\ 0.75 & 1.89 \end{bmatrix}; \qquad \mathrm{ACOV} = \begin{bmatrix} 0.14 & -0.05 \\ -0.05 & 0.55 \end{bmatrix}$$


From the results above it follows that the SEMs for the MLEs of $\theta_1$ and
$\theta_2$ are, respectively, sqrt(0.14)=0.37 and sqrt(0.55)=0.74. Clearly, the
estimated factor scores in the PA factor are far more accurate than the AN
factor scores, which agrees with the solution in table 1. The eigenvalues of
ACOV are 0.13 and 0.56, and the eigenvector associated with the smallest
eigenvalue is [0.99, 0.13]. So, the linear composite that maximizes
information in this example is

$$\theta_{ci} = 0.99\,\theta_1 + 0.13\,\theta_2$$

The conditional variance is the value of the smallest eigenvalue: 0.13,
and the SEM of the MLE of the composite is sqrt(0.13)=0.36, which is
smaller than the SEM of each individual MLE. Ackerman’s
multidimensional information is 1/0.13=7.69.
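
The figures of this first example can be retraced approximately from the printed information matrix. The sketch below starts from the two-decimal entries of I reported in the text, so small discrepancies with the reported values are to be expected; in particular, the reciprocal of the recomputed eigenvalue need not coincide with 1/0.13, which uses the rounded eigenvalue. Variable names are hypothetical.

```python
import numpy as np

# Information matrix as printed in the text (rounded to two decimals).
I = np.array([[7.42, 0.75],
              [0.75, 1.89]])

acov = np.linalg.inv(I)                  # asymptotic covariance of the Bartlett scores
sem = np.sqrt(np.diag(acov))             # SEM of each factor score

evals, evecs = np.linalg.eigh(acov)      # eigenvalues in ascending order
b = evecs[:, 0]
b = b if b[0] > 0 else -b                # orient the weights towards the PA axis
sem_composite = np.sqrt(evals[0])
info = 1.0 / evals[0]
angle = np.degrees(np.arctan2(b[1], b[0]))   # angle with the first (PA) axis

print(np.round(acov, 3))
print(np.round(sem, 3), np.round(evals, 3), np.round(b, 3))
print(round(sem_composite, 3), round(info, 3), round(angle, 1))
```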

Figure 1 shows the maximum-information linear composite (labelled
as Max I). It is represented by a vector that starts at the origin and whose
length is the square root of the multidimensional information:
sqrt(7.69)=2.77. As expected, the vector tends to fall along the $\theta_1$ axis,
lying at a narrow angle of about 7° with respect to this axis. Overall, the
contribution of the more precise PA factor to the linear composite that
maximizes information is far larger than the contribution of the AN factor.
In other words, the composite resembles the PA factor far more than the AN
factor.


Table 1. Bidimensional factor solution for the AQ items. First example

Item    $\theta_1$    $\theta_2$    $\sigma^2_{\varepsilon}$
PA1      0.80     0.00    1
PA2      1.16    -0.40    0.78
PA3      0.95    -0.26    1.39
PA4      0.60    -0.02    0.83
PA5      1.05    -0.33    1.08
PA6      1.08    -0.17    0.81
PA7     -0.44     0.65    1.82
PA8      0.67    -0.03    0.92
PA9      0.65     0.22    1.34
AN1      0.00     0.66    1.82
AN2      0.31     0.21    1.39
AN3      0.36     0.66    1.01
AN4      0.32    -0.22    1.61
AN5      0.38     0.39    0.88
AN6      0.42     0.58    1.04


We turn now to validity results. The weights for the linear composite
that maximizes validity with respect to DI scores were obtained by using
equation (15) and found to be 0.77 and 0.47. So, the linear composite that
maximizes validity in this example is

$$\theta_{cv} = 0.77\,\theta_1 + 0.47\,\theta_2$$

Using these weights, a respectable validity coefficient of 0.39 was
obtained. The validity coefficient disattenuated by correcting only the
MLEs (i.e. the estimated validity if the unknown true factor scores were
used instead of Bartlett’s estimates) was 0.42. For the sake of comparison,
the validity coefficients obtained with the separate factor scores were 0.35
(PA factor) and 0.26 (AN factor).

The Max Val vector in Figure 1 shows the maximum-validity linear
composite. Note that the direction of measurement of the two vectors
is not the same. They are separated by an angle of about 23°. The length of
the Max Val vector is the square root of the multidimensional information in this
direction, and its value is 2.22, which is appreciably smaller than the length of
the Max I vector. Overall, the results show that (a) if validity with respect to
DI is to be maximized, the contribution of both factors must be more
balanced, and (b) because the two vectors have different directions,
maximizing validity implies losing some precision with respect to the
maximum attainable precision (this point can be assessed by comparing the
lengths of both vectors).
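
The geometry of the two vectors can be retraced, approximately, from the rounded values reported above. The sketch inverts the printed information matrix of the first example and evaluates the information along each reported direction; the helper names `unit` and `info_along` are hypothetical, and the results agree with the reported angle and lengths only up to rounding.

```python
import numpy as np

# Printed (rounded) information matrix of the first example and reported weights.
acov = np.linalg.inv(np.array([[7.42, 0.75],
                               [0.75, 1.89]]))
b_info = np.array([0.99, 0.13])   # maximum-information weights
a_val = np.array([0.77, 0.47])    # maximum-validity weights

def unit(v):
    """Rescale a weight vector to unit length (direction cosines)."""
    return v / np.linalg.norm(v)

def info_along(direction, acov):
    """Multidimensional information 1 / (b' ACOV b) along a direction."""
    b = unit(direction)
    return 1.0 / (b @ acov @ b)

cosine = np.clip(unit(b_info) @ unit(a_val), -1.0, 1.0)
angle = np.degrees(np.arccos(cosine))            # angle between the two composites
len_info = np.sqrt(info_along(b_info, acov))     # length of the Max I vector
len_val = np.sqrt(info_along(a_val, acov))       # length of the Max Val vector

print(round(angle, 1), round(len_info, 2), round(len_val, 2))
```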



Figure 1. Graphical representation of the linear composites. First 
example. 

The second empirical example illustrates the use of some of the
present procedures in the tridimensional case. The measure used was a
Spanish version of the Perceived Stress Scale (PSS; Cohen, Kamarck and
Mermelstein, 1983), a theoretically unidimensional inventory made up of
14 items with a 5-point Likert format that measures the degree to which
situations in one’s life are perceived as stressful. The PSS was administered
to a group of 203 undergraduate students.

The aim of the original research was to assess the potential impact of
acquiescence and social desirability (SD) response bias on the PSS scores.
To do so, an orthogonal unrestricted solution in three factors (content,
acquiescence and social desirability) was obtained by using a procedure
developed by Ferrando, Lorenzo-Seva and Chico (2007). Details about the
procedure, goodness of fit, and resulting orthogonal solution are given in
that article.

In a well-designed instrument, the impact of the response biases 
would be low, and the scores would mainly reflect the content dimension 
that the test intends to measure. Here we shall show how the present 
procedures can be useful for addressing this point. By using the FA solution 
in three dimensions, the estimated information matrix and the ACOV were 
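found to be:

$$\mathbf{I} = \begin{bmatrix} 7.75 & -0.30 & -0.33 \\ -0.30 & 2.95 & 0.06 \\ -0.33 & 0.06 & 0.17 \end{bmatrix}; \qquad \mathrm{ACOV} = \begin{bmatrix} 0.14 & 0.01 & 0.27 \\ 0.01 & 0.34 & -0.10 \\ 0.27 & -0.10 & 6.42 \end{bmatrix}$$
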

where the dimensions are given in the order: content, acquiescence and SD.
From the results above it follows that the SEMs for the MLEs are:
sqrt(0.14)=0.37 (content), sqrt(0.34)=0.58 (acquiescence), and
sqrt(6.42)=2.53 (SD). The estimated factor scores in the content factor
(perceived stress) are thus far more accurate than the response-bias factor
scores, which is a good result. In particular, the low information provided
by the social desirability scores suggests that the impact of SD on the PSS is
very low.

The smallest eigenvalue of ACOV is 0.12, and the corresponding
eigenvector is [0.99, -0.06, -0.04]. So, the linear composite that maximizes
information in this second example is

$$\theta_{ci} = 0.99\,\theta_{\text{content}} - 0.06\,\theta_{\text{acq}} - 0.04\,\theta_{\text{SD}}$$

The conditional variance is the value of the smallest eigenvalue: 0.12,
and the SEM of the MLE of the composite is sqrt(0.12)=0.35, which is only
slightly smaller than the SEM of the content scores. Ackerman’s
multidimensional information is 1/0.12=8.33.

Figure 2 shows the maximum-information linear composite as a
vector in the tridimensional space. For simplicity the vector has been
represented in the positive sector and, as in the first example, its length is
the square root of the multidimensional information: sqrt(8.33)=2.88. The
vector is almost collinear with the content axis, and lies at a narrow angle of
about 4° with respect to this axis. So, the linear composite that maximizes
information is almost identical to the content factor, and this suggests that
the impact of the response biases on the PSS scores is very low. The most
accurate scores that can be obtained from the PSS solution are almost pure
measures of the content dimension.
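
The same check can be sketched for the tridimensional case, again starting from the printed two-decimal information matrix. Because the inputs are rounded, the recomputed quantities match the reported ones only approximately, and the smallest eigenvalue in particular may differ from the reported 0.12 in the second decimal. Variable names are hypothetical.

```python
import numpy as np

# Information matrix as printed (order: content, acquiescence, SD).
I = np.array([[7.75, -0.30, -0.33],
              [-0.30, 2.95, 0.06],
              [-0.33, 0.06, 0.17]])

acov = np.linalg.inv(I)
sem = np.sqrt(np.diag(acov))                 # SEM of each factor score

evals, evecs = np.linalg.eigh(acov)          # ascending eigenvalues
b = evecs[:, 0]
b = b if b[0] > 0 else -b                    # orient towards the content axis

# Angle between the composite and the content axis (b has unit length).
angle = np.degrees(np.arccos(np.clip(abs(b[0]), 0.0, 1.0)))

print(np.round(sem, 2), round(evals[0], 3), np.round(b, 2), round(angle, 1))
```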

Discussion 

This paper treats multiple FA as a linear MIRT model, and develops
procedures for determining a single test score, obtained as a linear
composite of the individual factor scores, which is optimal in some sense.
In particular I consider two optimization criteria: maximizing information
and maximizing validity with respect to an external variable. It is found
that, for the multiple FA model and for MLEs of the factor scores (i.e.
Bartlett’s scores), both maximization problems have a closed, relatively
simple solution. Furthermore, I propose a graphical representation
procedure (essentially a clamshell plot) that can be useful in applications.
From an applied point of view, the composite that maximizes information
will possibly be more useful in individual assessment, when what is needed
is a single score that discriminates among respondents as accurately as
possible (perhaps for classification purposes). The maximum validity
composite will be more useful in criterion-related studies, both for making
individual predictions and for assessing the relative importance of the
individual trait levels with respect to the criterion.

As far as the writer knows, the developments proposed here are
original. However, it is acknowledged that they are based on well-known
results. It is well known that FA can be treated as a linear IRT
model, and also well known are the asymptotic properties of MLEs of the
trait levels in the context of IRT models. The application of Ackerman’s
multidimensional information approach to the FA case is quite direct, and
so is the adaptation of the graphical procedures. On the other hand, the
validity results are obtained by combining standard error-in-variables
regression results with standard asymptotic results on ML estimation.
However, even though all of the proposals are relatively straightforward, I
have been unable to find a single study in which the general procedures
proposed here were proposed or used.



Figure 2. Graphical representation of the maximum-information linear 
composite. Second example. 

All the procedures I propose in the article can be computed from a
standard FA output by using simple routines written in matrix programs
such as MATLAB (1999). However, for the present proposal to have a
minimal chance of being used in applications, a user-friendly program that
implements all the procedures, including the graphic displays, must be
available to applied researchers. This is an objective for future
research.


RESUMEN 

Combinaciones lineales que maximizan la informacion y la validez en el 
modelo factorial para items continuos. En este articulo se desarrollan 
procedimientos para obtener combinaciones lineales de las puntuaciones 
factoriales que maximizen: (a) la informacion del test y (b) la validez 
externa en el modelo de analisis factorial multiple. El modelo factorial se 
considera como un modelo multidimensional de teoria de respuesta al item, 
y esto permite utilizar el enfoque de Ackerman para la medida de 
informacion multidimensional basada en la estimacion maximo verosimil de 
los niveles en el rasgo. Dicho enfoque lleva a resultados notablemente 
sencillos en lo que respecta a maximizar la informacion. Los resultados y 
procedimientos relacionados con la maximizacion de la validez externa 
parecen ser nuevos, y se obtienen desde el enfoque de la regresion con 
errores en las variables. Se proponen procedimientos para representar 
graficamente los resultados, y se presentan dos ejemplos empiricos para 
ilustrar la metodologia propuesta. 